xBRZ - the ultimate xBR style image upscaler/filter.
This is a collection of rpi-plugins for the Kega Fusion Emulator.
Windows only.
Modified code by "milo1012" (milo1012 AT freenet DOT de)

All original code by "Zenju" at the HqMAME project.
Take a look at
http://sourceforge.net/projects/hqmame
for sample pictures and more info.

xBRZ is in general more detail-preserving than xBR
and essentially better than the HQx filters!


Versions
---------

4xBRZ.rpi      4xBRZ - normal version - scales to e.g. 1280x960 (Mega Drive - PAL)
4xBRZ-MT.rpi   4xBRZ - (multi-)threaded version
3xBRZ.rpi      3xBRZ - normal version - scales to e.g. 960x720
3xBRZ-MT.rpi   3xBRZ - (multi-)threaded version
2xBRZ.rpi      2xBRZ - normal version - scales to e.g. 640x480
2xBRZ-MT.rpi   2xBRZ - (multi-)threaded version

The normal version does everything in a single thread, like most other plugins.
I recommend a recent-generation CPU running at 3 GHz or more to get 50/60 FPS
with the 4x version.
The 3x and 2x versions will probably run fine on much slower CPUs.

The threaded version scales the image in 4 slices, each in a separate thread,
which provides a decent speedup for all systems with at least two CPU cores,
especially for first generation Dual Core CPUs.
(Athlon 64 X2, first/2nd Intel Core gen. and similar)
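
To illustrate the slicing idea, here is a minimal sketch, not the plugin's
actual code. It assumes a 32-bit pixel buffer and uses std::thread for brevity;
scaleRows() and scaleThreaded() are hypothetical names, and a plain
nearest-neighbour copy stands in for the real xBRZ scaler.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real scaler: nearest-neighbour upscaling of
// the source rows [yFirst, yLast). Each source row writes only to its own
// output rows, so slices never touch the same destination pixels.
void scaleRows(const std::uint32_t* src, std::uint32_t* dst,
               int width, int factor, int yFirst, int yLast)
{
    const int dstWidth = width * factor;
    for (int y = yFirst; y < yLast; ++y)
        for (int x = 0; x < width; ++x)
        {
            const std::uint32_t px = src[y * width + x];
            for (int dy = 0; dy < factor; ++dy)
                for (int dx = 0; dx < factor; ++dx)
                    dst[(y * factor + dy) * dstWidth + (x * factor + dx)] = px;
        }
}

// Split the image into horizontal slices and scale each one in its own thread.
void scaleThreaded(const std::uint32_t* src, std::uint32_t* dst,
                   int width, int height, int factor, int sliceCount = 4)
{
    assert(sliceCount > 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < sliceCount; ++i)
    {
        const int yFirst = height * i / sliceCount;
        const int yLast  = height * (i + 1) / sliceCount;
        workers.emplace_back(scaleRows, src, dst, width, factor, yFirst, yLast);
    }
    for (std::thread& t : workers)
        t.join(); // return only after every slice has been written
}
```

Joining all threads before returning is what keeps the frame complete for the
caller. An edge-aware filter such as xBRZ would presumably also need to read a
few source rows beyond each slice border, which this stand-in does not do.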

I've tested the threaded version quite thoroughly,
but I can't guarantee that it's completely bug-free.
Using the plugin with VBA-M makes the threaded slice set-up visible on screen,
but that's the fault of VBA-M, since it doesn't wait until the function returns
before displaying the frame.


Performance
-----------

On my main system (Intel Core i5 3200 MHz) the normal 4x version runs well.
On an older 2000 MHz Core 2 CPU I get only about 30 FPS with it, but the
threaded version runs at full speed most of the time, with the exception of
some detailed game scenes which might bring the FPS below 50 for a short time.
A good example of that is the start of the Sonic 2 "Aquatic Ruin Zone"
(just fast forward the Sonic 2 automatic demo to see it).
Lots of detail in the current picture/scene = low performance!

Why is it so slow?
First of all, it is quite a compute-heavy algorithm.
Second, Kega uses a 16 (or 15) bit color depth internally (prior to final output),
so the plugin must convert to RGB24, do the scaling and convert back to 16 bit.
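
As a rough illustration of that conversion overhead, here is a minimal sketch,
not the plugin's actual code. It assumes the 16 bit case is laid out as RGB565
(the 15 bit case would use 5 bits per channel instead), and all function names
are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Expand one RGB565 pixel to 0x00RRGGBB. The top bits of each channel are
// replicated into the low bits so that full intensity maps to 255.
inline std::uint32_t rgb565ToRgb888(std::uint16_t p)
{
    const std::uint32_t r5 = (p >> 11) & 0x1F;
    const std::uint32_t g6 = (p >> 5) & 0x3F;
    const std::uint32_t b5 = p & 0x1F;
    const std::uint32_t r8 = (r5 << 3) | (r5 >> 2);
    const std::uint32_t g8 = (g6 << 2) | (g6 >> 4);
    const std::uint32_t b8 = (b5 << 3) | (b5 >> 2);
    return (r8 << 16) | (g8 << 8) | b8;
}

// Pack 0x00RRGGBB back into RGB565 by dropping the low bits of each channel.
inline std::uint16_t rgb888ToRgb565(std::uint32_t c)
{
    const std::uint32_t r8 = (c >> 16) & 0xFF;
    const std::uint32_t g8 = (c >> 8) & 0xFF;
    const std::uint32_t b8 = c & 0xFF;
    return static_cast<std::uint16_t>(((r8 >> 3) << 11) | ((g8 >> 2) << 5) | (b8 >> 3));
}

// Convert a whole run of pixels before scaling; the reverse pass after
// scaling works the same way on the (much larger) output buffer.
inline void convertRow565To888(const std::uint16_t* src, std::uint32_t* dst,
                               std::size_t count)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = rgb565ToRgb888(src[i]);
}
```

Both passes touch every pixel, and the pass back to 16 bit runs on the scaled
image, which at 4x has 16 times as many pixels as the source.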

If you have performance problems, especially with the 4x version, use the 3x or
2x version and let Kega's internal filter do the remaining upscaling to your
final screen resolution (with a slightly less good-looking result, of course),
or just use the threaded version if you have an older Multi Core system.
Avoid the threaded version on Single Core systems: you won't get any speedup
and will more likely see even lower performance.

I compiled all versions with Visual C++ .NET 2003 SP1.
There is no SSE required.
Interestingly, this produces tremendously faster code than newer versions such
as 2008/2010.
With a 2010 compile I get <50 FPS nearly all the time for the normal 4x plugin
version on my main system.
Only with SSE/SSE2 instructions enabled do I get roughly equal performance,
but still no better than with the 2003 version w/o SSE!
Thank you MS for optimizing your compilers by removing decent non-SIMD code! LOL

Feel free to modify the source code to improve the speed for older systems.





